Goto

Collaborating Authors

 Stark County


ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects

Lee, Woojin, Chang, Hyugjae, Moon, Jaeho, Lee, Jaehyup, Kim, Munchurl

arXiv.org Artificial Intelligence

Weakly supervised oriented object detection (WS-OOD) has gained attention as a cost-effective alternative to fully supervised methods, providing both efficiency and high accuracy. Among weakly supervised approaches, horizontal bounding box (HBox)-supervised OOD stands out for its ability to directly leverage existing HBox annotations while achieving the highest accuracy under weak supervision settings. This paper introduces adaptive bounding box scaling and symmetry-prior-based orientation prediction, called ABBSPO, a framework for WS-OOD. Our ABBSPO addresses limitations of previous HBox-supervised OOD methods, which compare ground truth (GT) HBoxes directly with the minimum circumscribed rectangles of predicted RBoxes, often leading to inaccurate scale estimation. To overcome this, we propose: (i) Adaptive Bounding Box Scaling (ABBS), which appropriately scales GT HBoxes to optimize for the size of each predicted RBox, ensuring more accurate scale prediction; and (ii) a Symmetric Prior Angle (SPA) loss that exploits inherent symmetry of aerial objects for self-supervised learning, resolving issues in previous methods where learning collapses when predictions for all three augmented views (original, rotated, and flipped) are consistently incorrect. Extensive experimental results demonstrate that ABBSPO achieves state-of-the-art performance, outperforming existing methods.


Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks

Shandilya, Utkarsh, Kappan, Marsha Mariya, Jain, Sanyam, Sharma, Vijeta

arXiv.org Artificial Intelligence

Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language models, especially the transformer-based CLIP model, offer promising capabilities for generalizing action recognition from video data. In this work, we evaluate CLIP on the UCF-101 dataset and systematically analyze its performance under three masking strategies: (1) percentage-based and shape-based black masking at 10%, 30%, and 50%, (2) feature-specific masking to suppress bias-inducing elements, and (3) isolation masking that retains only class-specific regions. Our results reveal that CLIP exhibits inconsistent behavior and frequent misclassifications, particularly when essential visual cues are obscured. To overcome these limitations, we propose incorporating class-specific noise, learned via a custom loss function, to reinforce attention to class-defining features. This enhancement improves classification accuracy and model confidence while reducing bias. We conclude with a discussion on the challenges of applying such models in clinical domains and outline directions for future work to improve generalizability across domain-independent healthcare scenarios.


Active Retrieval Augmented Generation

Jiang, Zhengbao, Xu, Frank F., Gao, Luyu, Sun, Zhiqing, Liu, Qian, Dwivedi-Yu, Jane, Yang, Yiming, Callan, Jamie, Neubig, Graham

arXiv.org Artificial Intelligence

Despite the remarkable ability of large language models (LMs) to comprehend and generate language, they have a tendency to hallucinate and create factually inaccurate output. Augmenting LMs by retrieving information from external knowledge resources is one promising solution. Most existing retrieval augmented LMs employ a retrieve-and-generate setup that only retrieves information once based on the input. This is limiting, however, in more general scenarios involving generation of long texts, where continually gathering information throughout generation is essential. In this work, we provide a generalized view of active retrieval augmented generation, methods that actively decide when and what to retrieve across the course of the generation. We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic method which iteratively uses a prediction of the upcoming sentence to anticipate future content, which is then utilized as a query to retrieve relevant documents to regenerate the sentence if it contains low-confidence tokens. We test FLARE along with baselines comprehensively over 4 long-form knowledge-intensive generation tasks/datasets. FLARE achieves superior or competitive performance on all tasks, demonstrating the effectiveness of our method. Code and datasets are available at https://github.com/jzbjyb/FLARE.


De-confounding Representation Learning for Counterfactual Inference on Continuous Treatment via Generative Adversarial Network

Zhao, Yonghe, Huang, Qiang, Zeng, Haolong, Pen, Yun, Sun, Huiyan

arXiv.org Artificial Intelligence

Counterfactual inference for continuous rather than binary treatment variables is more common in real-world causal inference tasks. While there are already some sample reweighting methods based on Marginal Structural Model for eliminating the confounding bias, they generally focus on removing the treatment's linear dependence on confounders and rely on the accuracy of the assumed parametric models, which are usually unverifiable. In this paper, we propose a de-confounding representation learning (DRL) framework for counterfactual outcome estimation of continuous treatment by generating the representations of covariates disentangled with the treatment variables. The DRL is a non-parametric model that eliminates both linear and nonlinear dependence between treatment and covariates. Specifically, we train the correlations between the de-confounded representations and the treatment variables against the correlations between the covariate representations and the treatment variables to eliminate confounding bias. Further, a counterfactual inference network is embedded into the framework to make the learned representations serve both de-confounding and trusted inference. Extensive experiments on synthetic datasets show that the DRL model performs superiorly in learning de-confounding representations and outperforms state-of-the-art counterfactual inference models for continuous treatment variables. In addition, we apply the DRL model to a real-world medical dataset MIMIC and demonstrate a detailed causal relationship between red cell width distribution and mortality.


Employing Drones in Agriculture: An Exploration of Various Drone Types and Key Advantages

Nunes, E. C.

arXiv.org Artificial Intelligence

This article explores the use of drones in agriculture and discusses the various types of drones employed for different agricultural applications. Drones, also known as unmanned aerial vehicles (UAVs), offer numerous advantages in farming practices. They provide real-time and high-resolution data collection, enabling farmers to make informed irrigation, fertilization, and pest management decisions. Drones assist in precision spraying and application of agricultural inputs, minimizing chemical wastage and optimizing resource utilization. They offer accessibility to inaccessible areas, reduce manual labor, and provide cost savings and increased operational efficiency. Drones also play a crucial role in mapping and surveying agricultural fields, aiding crop planning and resource allocation. However, challenges such as regulations and limited flight time need to be addressed. The advantages of using drones in agriculture include precision agriculture, cost and time savings, improved data collection and analysis, enhanced crop management, accessibility and flexibility, environmental sustainability, and increased safety for farmers. Overall, drones have the potential to revolutionize farming practices, leading to increased efficiency, productivity, and sustainability in agriculture.


'MLB The Show 17': 3 Examples Of DLC That Would Be Worth Buying

Forbes - Tech

The MLB The Show 17 series has never been big on downloadable content. That's something most fans of the series are probably thrilled about. Who wants to pay extra for more content? However, we know business is about adding revenue, and oftentimes that's accomplished with product expansion. Other sports video games have created a way to make an additional fortune through microtransactions that deliver new access to in-game features and boots.